ResilientStore: A Heuristic-Based Data Format Selector for Intermediate Results

نویسندگان

  • Rana Faisal Munir
  • Oscar Romero
  • Alberto Abelló
  • Besim Bilalli
  • Maik Thiele
  • Wolfgang Lehner
چکیده

Large-scale data analysis is an important activity in many organizations that typically requires the deployment of data-intensive workflows. As data is processed these workflows generate large intermediate results, which are typically pipelined from one operator to the following. However, if materialized, these results become reusable, hence, subsequent workflows need not recompute them. There are already many solutions that materialize intermediate results but all of them assume a fixed data format. A fixed format, however, may not be the optimal one for every situation. For example, it is well-known that different data fragmentation strategies (e.g., horizontal and vertical) behave better or worse according to the access patterns of the subsequent operations. In this paper, we present ResilientStore, which assists on selecting the most appropriate data format for materializing intermediate results. Given a workflow and a set of materialization points, it uses rule-based heuristics to choose the best storage data format based on subsequent access patterns. We have implemented ResilientStore for HDFS and three different data formats: SequenceFile, Parquet and Avro. Experimental results show that our solution gives 18% better performance than any solution based on a single fixed format.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatically Generating Back Ends Using Declarative Machine Descriptions

Despite years of work on retargetable compilers, creating a good, reliable back end for an optimizing compiler still entails a lot of hard work. Moreover, a critical component of the back end—the instruction selector—must be written by a person who is expert in both the compiler’s intermediate code and the target machine’s instruction set. By generating the instruction selector from declarative...

متن کامل

Statement João Dias December , 2009

Despite years of work on retargetable compilers, creating a good, reliable back end for an optimizing compiler still entails a lot of hard work. Moreover, a critical component of the back end—the instruction selector—must be written by a person who is expert in both the compiler’s intermediate code and the target machine’s instruction set. By generating the instruction selector from declarative...

متن کامل

Case-Based Reasoning as a Heuristic Selector in a Hyper-Heuristic for Course Timetabling Problems

This paper studies Knowledge Discovery (KD) using Tabu Search and Hill Climbing within Case-Based Reasoning (CBR) as a hyper-heuristic method for course timetabling problems. The aim of the hyper-heuristic is to choose the best heuristic(s) for given timetabling problems according to the knowledge stored in the case base. KD in CBR is a 2-stage iterative process on both case representation and ...

متن کامل

a swift heuristic algorithm base on data mining approach for the Periodic Vehicle Routing Problem: data mining approach

periodic vehicle routing problem focuses on establishing a plan of visits to clients over a given time horizon so as to satisfy some service level while optimizing the routes used in each time period. This paper presents a new effective heuristic algorithm based on data mining tools for periodic vehicle routing problem (PVRP). The related results of proposed algorithm are compared with the resu...

متن کامل

Optimized computational Afin image algorithm using combination of update coefficients and wavelet packet conversion

Updating Optimal Coefficients and Selected Observations Affine Projection is an effective way to reduce the computational and power consumption of this algorithm in the application of adaptive filters. On the other hand, the calculation of this algorithm can be reduced by using subbands and applying the concept of filtering the Set-Membership in each subband. Considering these concepts, the fir...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016